02-24-91
This is V1.01 of REGEX Globber.
02-22-91 Seattle, WA
Hmm. Choke. (Foot in mouth). After griping about buggy routines and
literally seconds after posting this code the first time, I received
a wonderful new test evaluation tool which allows you to perform
coverage analysis during testing. Sure enough I found that about
25% of the paths in the program were never traversed in my current
test battery. After swallowing my (overly large) pride and coming
up with a test battery which covered every path through the program,
I found a couple of minor logic bugs involving literal escapes (\)
within other patterns (i.e. [..] and * sequences). I have repackaged
these routines and included also the makefile I use and the test
battery I use to make things a bit easier.
jbk
02-20-91 Seattle, WA
Here is a *IX wildcard globber I butchered, hacked and cajoled together
after seeing and hearing about and becoming disgusted with several similar
routines which had one or more of the following attributes: slow, buggy,
required large levels of recursion on matches, required grotesque levels
of recursion on failing matches using '*', full of caveats about usability
or copyrights.
I submit this without copyright and with the clear understanding that
this code may be used by anyone, for any reason, with any modifications
and without any guarantees, warranty or statements of usability of any
sort.
Having gotten those cow chips out of the way, these routines are fairly
well tested and reasonably fast. I have made an effort to fail on all
bad patterns and to quickly determine failing '*' patterns. This parser
will also do quite a bit of the '*' matching via quick linear loops versus
the standard blind recursive descent.
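As a rough illustration of matching '*' with a linear loop instead of blind
recursion, here is a minimal sketch. It is not the code from match.c:
sketch_glob is a hypothetical name, it handles only '*' and '?' (no [..] sets
and no '\' escapes), and it uses the common single backtrack point technique,
which may differ from what regex_match_after_star actually does.

```c
#include <stddef.h>

/* Hypothetical helper, not part of match.c: supports only '*' and '?'. */
static int sketch_glob(const char *p, const char *t)
{
    const char *star = NULL;  /* most recent '*' seen in the pattern      */
    const char *mark = NULL;  /* text position where that '*' began       */

    while (*t) {
        if (*p == '*') {
            star = p++;       /* remember the star, let it match nothing  */
            mark = t;
        } else if (*p == '?' || *p == *t) {
            p++;              /* single character matched, advance both   */
            t++;
        } else if (star != NULL) {
            p = star + 1;     /* mismatch: the star absorbs one more char */
            t = ++mark;
        } else {
            return 0;         /* mismatch with no star to fall back on    */
        }
    }
    while (*p == '*')         /* trailing stars match the empty string    */
        p++;
    return *p == '\0';
}
```

With this sketch, sketch_glob("t*s*t", "tesseract") returns 1 and
sketch_glob("t*s*t", "texxeract") returns 0. A failing '*' pattern costs one
pass over the text per backtrack rather than a nested call per character.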
This parser has been submitted to profilers at various stages of development
and has come through quite well. If the last millisecond is important to
you then some time can be shaved by using stack allocated variables in
place of many of the pointer follows (which may be done fairly often) found
in regex_match and regex_match_after_star (ie *p, *t).
No attempt is made to provide general [pat,pat] comparisons. The specific
subcase supplied by these routines is [pat,text], which is sufficient
for the large majority of cases (should you care).
Since regex_match may return one of three different values depending upon
the pattern and text I have made a simple shell for convenience (match()).
Also included is an is_pattern routine to quickly check a potential pattern
for regex special characters. I even placed this all in a header file for
you lazy folks!
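The idea of a three-valued matcher wrapped in a boolean shell can be sketched
as follows. This is an illustration under stated assumptions, not the
regex_match from match.c: the names sk_match3, sk_match and the SK_* constants
are hypothetical, only '*' and '?' are handled, and the third value is assumed
to mean "the text ran out, stop searching", in the spirit of the ABORT define
in match.c.

```c
/* Hypothetical result codes for this sketch only. */
enum { SK_FALSE = 0, SK_TRUE = 1, SK_ABORT = 2 };

/* Three-valued matcher: SK_ABORT means the text was exhausted while the
 * pattern still needed characters, so no later starting point can work. */
static int sk_match3(const char *p, const char *t)
{
    for (; *p; p++, t++) {
        if (*p == '*') {
            while (*p == '*')          /* collapse runs of stars          */
                p++;
            if (!*p)
                return SK_TRUE;        /* trailing star takes the rest    */
            for (; *t; t++) {
                int r = sk_match3(p, t);
                if (r != SK_FALSE)
                    return r;          /* TRUE or ABORT both end the scan */
            }
            return SK_ABORT;           /* every start position failed     */
        }
        if (!*t)
            return SK_ABORT;           /* pattern wants more, text is done */
        if (*p != '?' && *p != *t)
            return SK_FALSE;           /* plain mismatch                  */
    }
    return *t ? SK_FALSE : SK_TRUE;    /* text must be fully consumed     */
}

/* Boolean shell: callers only care whether the match succeeded. */
static int sk_match(const char *p, const char *t)
{
    return sk_match3(p, t) == SK_TRUE;
}
```

In this sketch sk_match3("*?????", "test") returns SK_ABORT rather than
SK_FALSE, and the shell sk_match folds both into 0, so callers never need to
know about the third value.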
Having said all that, here is my own reinvention of the wheel. Please
enjoy its use and I hope it is of some help to those with need ....
jbk
==================== BEGIN of Listing ====================
Checksum: 2543766536 (verify or update this with "brik")
==================== MATCHMAK ====================
#
#
# Makefile for match.c
#
# Created 01-20-91 JBK
# Last Modified 02-13-91 JBK
#
#
CC = cl
#
# This is FLAGS for optimized version
#FLAGS = /c /AL /G0 /F 0FFF /Ox /W4
#
# This is FLAGS for optimized version with main
#FLAGS = /D TEST /AL /G0 /F 0FFF /Ox /W4
#
# This is FLAGS for debugging versions with main
FLAGS = /D TEST /AL /G0 /F 0FFF /Od /W4 /Zi /qc
match.exe: match.c match.h
	$(CC) $(FLAGS) match.c
==================== MATCHTST.BAT ====================
@echo The following tests should match
match test? testy
match test* test
match tes*t test
match *test test
match t*s*t test
match t*s*t tesseract
match t?s? test
match ?s*t psyot
match [a-z]s*t asset
match s[!gh]t set
match t[a-ce]st test
match tea[ea-c]up teacup
match [a-fh-z]* jack
match \i\** i*hello
match [\[-\]] [
match [a-z\\] \
match [a-z%_] b
match [\]] ]
match \i?* itch
match \i?* it
match ?*?*?t test
match ?*?*?*?* test
match *\]*\**\?*\[ ]this*is?atest[
@echo The following tests should fail
match hello
match test test
match \ test
match t*s*t texxeract
match t?st tst
match test? test
match s[!e]t set
match [] ]
match [ [
match [\[-\] [
match [a atest
match [a- atest
match [a-z atest
match [a-]* atest
match [a-fh-z jack
match [a-fh-z\] jack
match [a-fh-z] jack
match ?*?*?t*? test
match *????? test
==================== MATCH.H ====================
/*
EPSHeader
File: match.h
Author: J. Kercheval
Created: Sat, 01/05/1991 22:27:18
*/
/*
EPSRevision History
J. Kercheval Wed, 02/20/1991 22:28:37 Released to Public Domain
*/
/*
Wildcard Pattern Matching
*/
#ifndef BOOLEAN
# define BOOLEAN int
# define TRUE 1
# define FALSE 0
#endif
/*----------------------------------------------------------------------------
*
* Match the pattern PATTERN against the string TEXT;
* return TRUE if it matches, FALSE otherwise.
*
* A match means the entire string TEXT is used up in matching.
*
* In the pattern string:
* `*' matches any sequence of characters
* `?' matches any character
* [SET] matches any character in the specified set,
* [!SET] or [^SET] matches any character not in the specified set.
*
* Note: the standard regex character '+' (one or more) should be
* simulated by using "?*" which is equivalent here.
*
* A set is composed of characters or ranges; a range looks like
* character hyphen character (as in 0-9 or A-Z). [0-9a-zA-Z_] is the
* minimal set of characters allowed in the [..] pattern construct.
* Other characters are allowed (ie. 8 bit characters) if your system
* will support them.
*
* To suppress the special syntactic significance of any of `[]*?!^-\',
* and match the character exactly, precede it with a `\'.
*
----------------------------------------------------------------------------*/
BOOLEAN match (char *pattern, char *text);
/*----------------------------------------------------------------------------
*
* Return TRUE if PATTERN has any special wildcard characters
*
----------------------------------------------------------------------------*/
BOOLEAN is_pattern (char *pattern);
==================== MATCH.C ====================
/*
EPSHeader
File: match.c
Author: J. Kercheval
Created: Sat, 01/05/1991 22:21:49
*/
/*
EPSRevision History
J. Kercheval Wed, 02/20/1991 22:29:01 Released to Public Domain
J. Kercheval Fri, 02/22/1991 15:29:01 fix '\' bugs (two :( of them)
*/
/*
Wildcard Pattern Matching
*/
#include "match.h"
#define ABORT 2 /* end of search indicator */
BOOLEAN regex_match_after_star (char *pattern, char *text);
/*----------------------------------------------------------------------------
*
* Return TRUE if PATTERN has any special wildcard characters
*
----------------------------------------------------------------------------*/
BOOLEAN is_pattern (char *p)
{
    while ( *p ) {
        switch ( *p++ ) {
            case '?':
            case '*':
            case '[':
                return TRUE;
            case '\\':
                if ( !*p++ ) return FALSE;
        }
    }
    return FALSE;
}
/*----------------------------------------------------------------------------
*
* Match the pattern PATTERN against the string TEXT;
* return TRUE if it matches, FALSE otherwise.
*
* A match means the entire string TEXT is used up in matching.
*
* In the pattern string:
* `*' matches any sequence of characters
* `?' matches any character
* [SET] matches any character in the specified set,
* [!SET] or [^SET] matches any character not in the specified set.
*
* Note: the standard regex character '+' (one or more) should be
* simulated by using "?*" which is equivalent here.
*
* A set is composed of characters or ranges; a range looks like
* character hyphen character (as in 0-9 or A-Z). [0-9a-zA-Z_] is the
* minimal set of characters allowed in the [..] pattern construct